Last updated: 2017-03-05
Approachable:
Bayesian:
MCMC:
Technical:
Very Important:
Moderately Important:
Minimally Important:
Our tests (so far) assume independence of samples.
Lots of data have structure at additional levels beyond the main variable(s) of interest
We need a method to model that correlation structure.
Explicitly by design:
Implicitly as a side effect:
Multilevel / hierarchical / mixed / repeated measures models
Fixed effects:
Random effects
Also see: http://andrewgelman.com/2005/01/25/why_i_dont_use/
Units are grouped at different levels.
\[Y = \beta_0 + \beta_1 X_1 + ... + \beta_n X_n + \mbox{Random}\]
Where \(\mbox{Random}\) can be:
Each bird gets its own mean value for a maneuver (across 5 trials). Maneuver is analyzed as the mean difference between each maneuver.
Conceptual complexity
Computational complexity
Among the options for R packages:
nlme: Oldest, restricted to a subset of models (linear and non-linear Gaussian), for us the place to startlme4: Newer (but same authors as nlme), can handle more complex models with crossed random effects, pedigrees, non-Gaussian response, etc., more flexible distributions of response variables, but harder to test hypothesesMCMCglmm, rstan, rjags: Fit Bayesian multilevel models via a variety of MCMC samplersSeasonal patterns in testis mass in the squid Loligo. Testis mass predicted by dorsal mantle length, month, and DML X month.
Squid dataS <- read_excel("../data/Squid.xlsx")
S <- S %>% mutate(Month = factor(Month))
str(S)
## Classes 'tbl_df', 'tbl' and 'data.frame': 768 obs. of 5 variables: ## $ Specimen : num 1017 1034 1070 1070 1019 ... ## $ Year : num 1991 1990 1990 1990 1990 ... ## $ Month : Factor w/ 12 levels "1","2","3","4",..: 2 9 12 11 8 10 5 7 7 7 ... ## $ DML : num 136 144 108 130 121 117 133 105 109 97 ... ## $ Testis_Mass: num 0.006 0.008 0.008 0.011 0.012 0.012 0.013 0.015 0.017 0.017 ...
# Check for missing data sum(!complete.cases(S))
## [1] 0
Testis mass modeled by DML, Month, and the DML X Month interaction:
fm1 <- lm(Testis_Mass ~ DML + Month + DML:Month, data = S)
Extract the residuals and fitted values:
S$Res1 <- residuals(fm1) S$Fitted <- fitted.values(fm1)
What we need:
DML (here, more spread with larger values)Solution:
nlme::gls()library(nlme) variance_struct <- varPower(form = ~ DML | Month)
DML on a per-Month basis.fm2 <- gls(Testis_Mass ~ DML * Month, data = S,
weights = variance_struct)
plot(fm2)
library(car) Anova(fm2, type = "III")
## Analysis of Deviance Table (Type III tests) ## ## Response: Testis_Mass ## Df Chisq Pr(>Chisq) ## (Intercept) 1 12.830 0.0003411 *** ## DML 1 97.686 < 2.2e-16 *** ## Month 11 131.398 < 2.2e-16 *** ## DML:Month 11 263.769 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
S$Res2 <- residuals(fm2, type = "normalized")
## coef(fm2) ## (Intercept) -4.780826964 ## DML 0.046586983 ## Month2 1.334893906 ## Month3 3.661220332 ## Month4 3.410126677 ## Month5 3.953042298 ## Month6 -0.666390313 ## Month7 3.669116396 ## Month8 3.241297524 ## Month9 -1.646885027 ## Month10 0.520272100 ## Month11 -1.742198739 ## Month12 -0.957170988 ## DML:Month2 -0.007741310 ## DML:Month3 -0.020683250 ## DML:Month4 -0.023433233 ## DML:Month5 -0.028570828 ## DML:Month6 -0.012384582 ## DML:Month7 -0.036971343 ## DML:Month8 -0.036261427 ## DML:Month9 -0.002745741 ## DML:Month10 -0.008962055 ## DML:Month11 0.007326031 ## DML:Month12 0.007205327
Data with hierarchical structure:
We could just take a means of each block and use those data for ANOVA.
Collected by National Institute for Coastal and Marine Management (RIKZ)
RIKZ <- read_excel("../data/RIKZ.xlsx")
glimpse(RIKZ)
## Observations: 45 ## Variables: 5 ## $ Sample <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16... ## $ Richness <dbl> 11, 10, 13, 11, 10, 8, 9, 8, 19, 17, 6, 1, 4, 3, 3, 1... ## $ Exposure <dbl> 10, 10, 10, 10, 10, 8, 8, 8, 8, 8, 11, 11, 11, 11, 11... ## $ NAP <dbl> 0.045, -1.036, -1.336, 0.616, -0.684, 1.190, 0.820, 0... ## $ Beach <dbl> 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4,...
Richness: Number of speciesNAP: Height above mean tidal levelExposure: Index composed of wave action, length of the surf zone, slope, grain size, and the depth of the anaerobic layerConvert Beach to factor.
RIKZ <- RIKZ %>% mutate(Beach = factor(Beach))
Sites are the level of the observation.
RIKZ %>% group_by(Beach) %>% tally()
## # A tibble: 9 × 2 ## Beach n ## <fctr> <int> ## 1 1 5 ## 2 2 5 ## 3 3 5 ## 4 4 5 ## 5 5 5 ## 6 6 5 ## 7 7 5 ## 8 8 5 ## 9 9 5
5 observations at each of 9 beaches
NAPExposureEach Beach only has 1 value of Exposure.
fm1 <- lm(Richness ~ NAP + Beach, data = RIKZ) Anova(fm1, type = "III")
## Anova Table (Type III tests) ## ## Response: Richness ## Sum Sq Df F value Pr(>F) ## (Intercept) 466.37 1 49.8039 3.224e-08 *** ## NAP 230.66 1 24.6322 1.793e-05 *** ## Beach 416.37 8 5.5581 0.0001441 *** ## Residuals 327.74 35 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Maybe we are not interested in knowing the exact nature of the beach effect.
Allow a different mean richness per beach:
fm2 <- lme(Richness ~ NAP, random = ~ 1 | Beach, data = RIKZ)
plot(fm2)
## Linear mixed-effects model fit by REML ## Data: RIKZ ## AIC BIC logLik ## 247.4802 254.525 -119.7401 ## ## Random effects: ## Formula: ~1 | Beach ## (Intercept) Residual ## StdDev: 2.944065 3.05977 ## ## Fixed effects: Richness ~ NAP ## Value Std.Error DF t-value p-value ## (Intercept) 6.581893 1.0957618 35 6.006682 0 ## NAP -2.568400 0.4947246 35 -5.191574 0 ## Correlation: ## (Intr) ## NAP -0.157 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -1.4227495 -0.4848006 -0.1576462 0.2518966 3.9793918 ## ## Number of Observations: 45 ## Number of Groups: 9
Generate predicted values for overall population level estimates and within beach estimates.
# Overall RIKZ$Pred0 <- fitted(fm2, level = 0) # Within beach RIKZ$Pred1 <- fitted(fm2, level = 1)
Allow each Beach to have its own NAP to Richness linear relationship:
fm3 <- lme(Richness ~ NAP, random = ~ NAP | Beach, data = RIKZ)
Now the NAP to Richness relationship can vary for each Beach.
Model formula for random effects:
\[\sim \mbox{Slope}~|~\mbox{Intercept}\]
summary(fm3)
## Linear mixed-effects model fit by REML ## Data: RIKZ ## AIC BIC logLik ## 244.3839 254.9511 -116.1919 ## ## Random effects: ## Formula: ~NAP | Beach ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 3.549064 (Intr) ## NAP 1.714963 -0.99 ## Residual 2.702820 ## ## Fixed effects: Richness ~ NAP ## Value Std.Error DF t-value p-value ## (Intercept) 6.588706 1.264761 35 5.209448 0e+00 ## NAP -2.830028 0.722940 35 -3.914610 4e-04 ## Correlation: ## (Intr) ## NAP -0.819 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -1.8213326 -0.3411043 -0.1674617 0.1921216 3.0397049 ## ## Number of Observations: 45 ## Number of Groups: 9
Generate predicted values for overall population level estimates and within beach estimates.
# Overall RIKZ$Pred0 <- fitted(fm3, level = 0) # Within beach RIKZ$Pred1 <- fitted(fm3, level = 1)
## (Intercept) NAP ## 1 8.421057 -3.656268 ## 2 12.363497 -5.536814 ## 3 3.806643 -1.505716 ## 4 3.562422 -1.385960 ## 5 11.200150 -5.137323 ## 6 4.426279 -1.775675 ## 7 4.082944 -1.644329 ## 8 5.099895 -2.106852 ## 9 6.335471 -2.721314
Sequential measurements of the same individual / plot / organism, etc. will result in within-subject correlation.
Distance from the sella turcica to the pterygomaxillary fissure in growing children.
O <- read_excel("../data/Ortho.xlsx")
O <- O %>% mutate(Subject = factor(Subject),
Sex = factor(Sex))
glimpse(O)
## Observations: 108 ## Variables: 4 ## $ Distance <dbl> 26.0, 25.0, 29.0, 31.0, 21.5, 22.5, 23.0, 26.5, 23.0,... ## $ Age <dbl> 8, 10, 12, 14, 8, 10, 12, 14, 8, 10, 12, 14, 8, 10, 1... ## $ Subject <fctr> M01, M01, M01, M01, M02, M02, M02, M02, M03, M03, M0... ## $ Sex <fctr> Male, Male, Male, Male, Male, Male, Male, Male, Male...
O %>% group_by(Sex) %>% tally()
## # A tibble: 2 × 2 ## Sex n ## <fctr> <int> ## 1 Female 44 ## 2 Male 64
range(O$Age)
## [1] 8 14
Potthoff, R. F. and Roy, S. N. (1964) A generalized multivariate analysis of variance model useful especially for growth curve problems. Biometrika 51: 313-326.
Age and Sex (like an ANCOVA model).Age by Distance relationship (slope) to vary for each Subject, which is also allowed to have its own mean.fm2 <- lme(Distance ~ Sex + Age, random = ~ Age | Subject,
method = "ML",
data = O)
# Overall O$Pred0 <- fitted(fm2, level = 0) # Age within Subject O$Pred1 <- fitted(fm2, level = 1)
## Linear mixed-effects model fit by maximum likelihood ## Data: O ## AIC BIC logLik ## 446.8352 465.6101 -216.4176 ## ## Random effects: ## Formula: ~Age | Subject ## Structure: General positive-definite, Log-Cholesky parametrization ## StdDev Corr ## (Intercept) 2.6447350 (Intr) ## Age 0.2149243 -0.76 ## Residual 1.3100397 ## ## Fixed effects: Distance ~ Sex + Age ## Value Std.Error DF t-value p-value ## (Intercept) 15.489709 0.9328629 80 16.604486 0.0000 ## SexMale 2.145491 0.7391991 25 2.902453 0.0076 ## Age 0.660185 0.0709131 80 9.309772 0.0000 ## Correlation: ## (Intr) SexMal ## SexMale -0.470 ## Age -0.792 0.000 ## ## Standardized Within-Group Residuals: ## Min Q1 Med Q3 Max ## -3.16563246 -0.45463448 0.01446427 0.44559465 3.90045094 ## ## Number of Observations: 108 ## Number of Groups: 27
Anova(fm2, type = "III")
## Analysis of Deviance Table (Type III tests) ## ## Response: Distance ## Chisq Df Pr(>Chisq) ## (Intercept) 283.5863 1 < 2.2e-16 *** ## Sex 8.6649 1 0.003244 ** ## Age 89.1482 1 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Adapted from McElreath (2015)
No.
Watch this lecture again for review.
Watch Lecture 08-4
Bolker, B. M., M. Brooks, C. J. Clark, S. Geange, J. Poulsen, M. Stevens, and J. White. 2009. Generalized Linear Mixed Models: A Practical Guide for Ecology and Evolution. Trends in ecology & evolution 24:127–135.
Hadfield, J. D. 2010. MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package. Journal of statistical software 33:1–22.
McElreath, R. 2015. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. Chapman; Hall/CRC, Boca Raton, FL.
Pinheiro, J. C., and D. M. Bates. 2000. Mixed-Effects Models in S and S-PLUS. Springer.
Zuur, A. F., E. N. Ieno, N. J. Walker, A. A. Saveliev, and G. M. Smith. 2009. Mixed Effects Models and Extensions in Ecology with R. Springer, New York.